Abstract

The bootstrap method of resampling can be useful in estimating the replicability of study 
results. The bootstrap procedure creates a mock population from a given sample of data 
from which multiple samples are then drawn. The method extends the usefulness of the 
jackknife procedure as it allows for computation of a given statistic across a maximal 
number of fluctuations in the original sample on which the bootstrap data are based. A 
sample set of data is used to demonstrate the bootstrap procedure for a univariate 
multiple regression analysis. 

The Beginner’s Guide to the Bootstrap Method of Resampling 

Result replicability, along with statistical significance and result importance, is 
one of the essential elements of the “research triumvirate” (Carver, 1987; Tukey, 1969). 
The whole purpose of taking the trouble and expense to collect samples and data is to 
make inferences about a particular population of interest. If study results will not 
replicate or generalize to the population of interest, then they are of limited value 
(Thompson, 1992; Tukey, 1969). Unfortunately, it is often impractical, if not 
impossible, to replicate studies conducted in the social and behavioral sciences. It can 
take years to reproduce the conditions of the original study and, for many researchers, 
replication is cost prohibitive. One must also consider the timeliness of getting results 
to press. It is generally not in the interest of science to delay publication of important 
results for years until the entire study can be repeated with precision. Fortunately, 
growing accessibility to inexpensive computer power has encouraged statisticians to 
explore internal replication procedures as alternatives to complete study replication. 

One of the most useful internal replication methods gaining popularity among 
statisticians is the bootstrap method of resampling. 

The bootstrap method of resampling, invented in the 1970s by Bradley Efron, is a 
computing-intensive procedure that simulates a population using the original sample set 
of data (Chernick, 1999). The simulated population is used to make judgements 
regarding the statistical analyses performed on the original sample set of data. Instead of 
relying on the theoretical sampling distributions for certain sample sizes, the bootstrap 
procedure creates an empirical distribution for a sample statistic through repeated 
sampling with replacement from the original sample (Chernick, 1999; Diaconis & Efron, 
1983; Lunneborg, 1992; Thompson, 1992). 

The bootstrap is conceptually quite simple and can be applied to a wide range of 
statistics. Even an inexperienced researcher can follow the logic: A random sample of 
data is drawn from a population of interest. Sample statistics are computed. The original 
sample is then copied many times to create a pseudo-population. Many random samples 
(of size equal to the original sample) are then drawn (with replacement) from this 
pseudo-population. Statistics are computed for each sample in order to create a distribution of 
each sample statistic. Statisticians (Chernick, 1999; Diaconis & Efron, 1983; Lunneborg, 
1987) have empirically demonstrated through computer simulations that the sampling 
distribution created from the bootstrap samples (F*) mirrors the true sampling 
distribution of the statistic (F). The number of replications required to create the “ideal” 
bootstrap sampling distribution (F^) is n^n, where n is the original sample size (Fox, 1997). 
However, Diaconis and Efron (1983) have demonstrated that one can approach the ideal 
F* distribution with as few as 100 replications. 1 
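
The logic above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the analysis in this paper: the function name bootstrap_distribution, the scores data, the fixed seed, and the default of 1000 replications are all assumptions made for the example. Drawing with replacement directly from the original sample is equivalent to drawing with replacement from a pseudo-population built from many copies of it, so the copying step is not coded explicitly.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_replications=1000, seed=0):
    """Return the statistic computed on each bootstrap resample.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_replications):
        # Draw n observations with replacement from the original sample.
        resample = rng.choices(sample, k=n)
        replicates.append(statistic(resample))
    return replicates


# Made-up illustration data, not taken from the paper.
scores = [12, 15, 9, 21, 18, 14, 11, 17, 16, 13]
boot_means = bootstrap_distribution(scores, statistics.mean)
```

For this sample of n = 10, the ideal distribution would involve n^n = 10^10 possible resamples, which is why a modest number of random replications is drawn instead.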

The bootstrap distributions can be used for two purposes (Hinkle & Winstead, 
1990; Lunneborg, 1997; Thompson, 1992). First, means and standard deviations of the 
bootstrap sample statistics may be computed and used to create confidence intervals 
around the original sample statistics. Thus, the researcher has some evidence of the 
stability of his or her results over many different configurations of samples. Second, the 
empirically created distributions can be used to make decisions regarding statistical 
significance when theoretical sampling distributions are not available. 
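
The first use described above might look like the following sketch. The function name bootstrap_summary, the made-up scores, the seed, and the multiplier z = 1.96 are assumptions for illustration; the mean-and-standard-deviation interval presumes the bootstrap distribution is roughly normal. The percentile interval is a common alternative that is not described in the text and is included only for comparison.

```python
import random
import statistics


def bootstrap_summary(sample, statistic, n_replications=1000, z=1.96, seed=0):
    """Summarize a bootstrap distribution (hypothetical helper, for illustration)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = [
        statistic(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(n_replications)
    ]
    observed = statistic(sample)
    boot_se = statistics.stdev(replicates)  # SD of replicates = bootstrap standard error
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two.
    cuts = statistics.quantiles(replicates, n=40)
    return {
        "observed": observed,
        "boot_mean": statistics.mean(replicates),
        "boot_se": boot_se,
        "normal_interval": (observed - z * boot_se, observed + z * boot_se),
        "percentile_interval": (cuts[0], cuts[-1]),
    }


# Made-up illustration data, not taken from the paper.
scores = [12, 15, 9, 21, 18, 14, 11, 17, 16, 13]
summary = bootstrap_summary(scores, statistics.median)
```

The median is used here because its theoretical sampling distribution is less convenient than that of the mean. A narrow interval around the observed statistic suggests the result is stable across resamples.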

One of the biggest advantages of the bootstrap method is that it does not require 
the assumption that the standard errors in the observed values be randomly and normally 
distributed in order to work effectively (Chernick, 1999; Hinkle & Winstead, 1990; 
Lunneborg, 1987; Reinhardt, 1992; Thompson, 1992). This assumption of normality is 
often required before classical statistical analysis can proceed (Lunneborg, 1987). 
Instead, the bootstrap method creates its own empirical distribution from the data at hand. 

This is an important discovery considering that most classical statistical tests 
make normality assumptions about the sampling distribution. Often, this assumption is 
tenuous for data collected in the social and behavioral sciences (Bickel & Freedman, 
1981). Until Efron invented the bootstrap, the researcher had two choices for making 
statistical decisions (Lunneborg, 1987). One was to assume he or she knew everything 
about the form of the distribution and use parametric techniques that utilize the 
theoretical sampling distributions. The other was to assume one knew nothing about the 
form of the distribution and use nonparametric techniques. The bootstrap finally offers a 
compromise between “everything and nothing”. 

Another advantage of the bootstrap procedure is that it avoids some of the 
problems associated with statistical significance and sample size (Fan, 1994; Thompson, 
1992). It has been shown through simulated experiments that statistical significance is a 
function of sample size (Lane, 1999; Morrison & Hinkle, 1970; Thompson, 1998). If the 
sample size is large enough, statistical significance is virtually assured for any nonzero 
effect, however small. The bootstrap can be useful in research areas where it is difficult to obtain large 
samples (e.g., special education). If a study’s results are not statistically significant even 
though the effect size is noteworthy, the bootstrap can give some estimate of result 
replicability. This could provide a basis for publication of important research that may be 
confounded by small sample size. 

1 As few as 100 replications may be sufficient for descriptive statistics such as means and standard 
deviations. However, for more complicated procedures or for creating confidence intervals, it has been 
suggested that as many as 1000 replications may be needed (Diaconis & Efron, 1983; Fox, 1997). 

Given the importance of result replicability and the availability of internal 
replication procedures, it is surprising that many researchers fail to address the issue 
(Giroir, 1989; Reinhardt, 1992). Even though replication is one of the basic principles of 
research, surveys of professional journals in the behavioral sciences find that researchers 
pay little attention to this principle or evaluate it in inappropriate ways. One 
reason for the lapse is the widespread misuse of statistical significance tests (SSTs). 
Many researchers mistakenly assume that statistically significant results indicate the 
probability that results will replicate in future samples (Carver, 1978; Daniel, 1998; 
Thompson, 1998). Thus, there is a large population of researchers who feel it 
unnecessary to bother with internal replication. Many have illuminated the folly of this 
assumption (Carver, 1978; Daniel, 1998; Lane, 1999; Thompson, 1998). SSTs are 
predicated on a true null hypothesis in the population of interest. The researcher cannot 
make further inferences about probabilities in the population based on calculated p results 
from SSTs because the population effect was “forcibly” set at zero (null). The calculated p 
value speaks only to the probability of the sample results, given that the null hypothesis is true. 

Unfamiliarity with internal replication procedures is another reason why many 
researchers fail to address the issue. Many of those involved in educational research 
(author included) are neither expert statisticians nor computer prodigies. Although there 
are several software packages available for performing bootstrap procedures (e.g., 
S-Plus), most require the user to work directly with the program syntax. Because these programs are 



